183 research outputs found

    A New Approach to Clustering Biological Data Using Message Passing.

    Motivation: Clustering algorithms are widely used in bioinformatics, having been applied to a range of problems from the analysis of gene expression to the building of phylogenetic trees. Biological data often describe parallel and spontaneous processes such as molecular interactions and genome evolution. To capture these features, we propose a new clustering algorithm that employs the concept of message passing. Methods: Inspired by a real-world situation in which people who have never met can form groups by exchanging messages, Message Passing Clustering (MPC) allows data objects to communicate with each other and produces clusters in parallel, thereby making the clustering process intrinsic. Other advantages of MPC over traditional clustering methods are that it is relatively straightforward to understand and implement and that it takes into account both local and global structure. We have proved that MPC shares similarity with Hierarchical Clustering (HC) but offers significantly improved performance. Results: To validate the MPC method, we analyzed 35 sets of simulated dynamic gene expression data, achieving a 95% hit rate with 639 of 674 genes correctly clustered. We also applied MPC to real data sets to build a phylogenetic tree for 34 strains from nine species of Mycobacterium and to cluster 698 genes from a yeast cell-cycle database. The results show higher classification accuracies than traditional clustering methods.
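
The abstract does not spell out the MPC update rule, so the following is only a minimal sketch of one plausible reading: in each round every cluster "messages" its nearest neighbour, and pairs that choose each other merge at the same time. The average-linkage distance, the function names, and the stopping criterion (`k` clusters or fewer) are assumptions made for illustration, not the authors' implementation.

```python
def average_linkage(a, b, dist):
    """Mean pairwise distance between two clusters (tuples of object indices)."""
    return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))


def mpc_round(clusters, dist):
    """One parallel round: mutual nearest neighbours merge, the rest stay as they are."""
    # Each cluster sends a "message" to the cluster it is closest to.
    nearest = {
        c: min((o for o in clusters if o != c),
               key=lambda o: average_linkage(c, o, dist))
        for c in clusters
    }
    merged, used = [], set()
    for c in clusters:
        if c in used:
            continue
        partner = nearest[c]
        if nearest[partner] == c:
            # Both clusters chose each other, so they merge in this round.
            merged.append(tuple(sorted(c + partner)))
            used.update((c, partner))
        else:
            merged.append(c)
            used.add(c)
    return merged


def mpc_cluster(dist, k):
    """Repeat rounds until at most k clusters remain (assumed stopping rule)."""
    clusters = [(i,) for i in range(len(dist))]
    while len(clusters) > max(k, 1):
        clusters = mpc_round(clusters, dist)
    return clusters
```

Because several pairs can merge in the same round, the number of clusters may drop by more than one per round, which is the main difference from the one-merge-per-step loop of standard agglomerative hierarchical clustering.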

    Virtual CGH: an integrative approach to predict genetic abnormalities from gene expression microarray data applied in lymphoma

    Background: Comparative Genomic Hybridization (CGH) is a molecular approach for detecting DNA Copy Number Alterations (CNAs) in tumors, which are among the key causes of tumorigenesis. However, in the post-genomic era, most studies in cancer biology have focused on Gene Expression Profiling (GEP) rather than CGH, and as a result, an enormous amount of GEP data has accumulated in public databases for a wide variety of tumor types. We exploited this resource of GEP data to define possible recurrent CNAs in tumors. In addition, the CNAs identified by GEP would be more functionally relevant to disease pathogenesis, since the functional effects of CNAs can be reflected by altered gene expression. Methods: We proposed a novel computational approach, coined virtual CGH (vCGH), which employs hidden Markov models (HMMs) to predict DNA CNAs from their corresponding GEP data. vCGH was first trained on the paired GEP and CGH data generated from a sufficient number of tumor samples, and then applied to the GEP data of a new tumor sample to predict its CNAs. Results: Using cross-validation on 190 Diffuse Large B-Cell Lymphomas (DLBCL), vCGH achieved 80% sensitivity, 90% specificity and 90% accuracy for CNA prediction. The majority of the recurrent regions defined by vCGH are concordant with the experimental CGH, including gains of 1q, 2p16-p14, 3q27-q29, 6p25-p21, 7, 11q, 12 and 18q21, and losses of 6q, 8p23-p21, 9p24-p21 and 17p13 in DLBCL. In addition, vCGH predicted some recurrent functional abnormalities which were not observed in CGH, including gains of 1p, 2q and 6q and losses of 1q, 6p and 8q. Among those novel loci, 1q, 6q and 8q were significantly associated with the clinical outcomes in the DLBCL patients (p < 0.05). Conclusions: We developed a novel computational approach, vCGH, to predict genome-wide genetic abnormalities from GEP data in lymphomas. vCGH can be generally applied to other types of tumors and may significantly enhance the detection of functionally important genetic abnormalities in cancer research.
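
To make the HMM idea concrete, here is a hedged sketch of Viterbi decoding over three hidden copy-number states using discretized expression calls along a chromosome. The state names, the observation alphabet (`down`, `flat`, `up`), and every probability below are illustrative placeholders, not the trained vCGH parameters, which the abstract says are learned from paired GEP and CGH data.

```python
import math

STATES = ("loss", "normal", "gain")

# Illustrative parameters only (assumed, not taken from the paper).
START = {"loss": 0.1, "normal": 0.8, "gain": 0.1}
TRANS = {s: {t: (0.9 if s == t else 0.05) for t in STATES} for s in STATES}
EMIT = {
    "loss": {"down": 0.7, "flat": 0.2, "up": 0.1},
    "normal": {"down": 0.2, "flat": 0.6, "up": 0.2},
    "gain": {"down": 0.1, "flat": 0.2, "up": 0.7},
}


def viterbi(observations):
    """Return the most probable hidden-state path for a list of observations."""
    if not observations:
        return []
    # Log-probabilities avoid numerical underflow on long sequences.
    score = {s: math.log(START[s]) + math.log(EMIT[s][observations[0]])
             for s in STATES}
    back = []
    for obs in observations[1:]:
        prev, score, pointers = score, {}, {}
        for t in STATES:
            best = max(STATES, key=lambda s: prev[s] + math.log(TRANS[s][t]))
            pointers[t] = best
            score[t] = (prev[best] + math.log(TRANS[best][t])
                        + math.log(EMIT[t][obs]))
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]
```

The high self-transition probability encodes the assumption that copy-number states persist across neighbouring genes, so a single outlying expression value is less likely to flip the predicted state.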

    A Dynamic Bayesian Network Model for Hierarchial Classification and its Application in Predicting Yeast Genes Functions

    In this paper, we propose a Dynamic Naive Bayesian (DNB) network model for classifying data sets with hierarchical labels. The DNB model is built upon a Naive Bayesian (NB) network, a successful classifier for data with flattened (nonhierarchical) class labels. The problems of using flattened class labels for hierarchical classification are addressed in this paper. The DNB has a top-down structure with each level of the class hierarchy modeled as a random variable. We define augmenting operations to transform the class hierarchy into a form that satisfies the probability law. We present algorithms for efficient learning and inference with the DNB model. The learning algorithm can be used to estimate the parameters of the network. The inference algorithm is designed to find the optimal classification path in the class hierarchy. The methods are tested on yeast gene expression data sets, and the classification accuracy of the DNB classifier is significantly higher than that of the previous approach, flattened classification using the NB classifier.

    Message Passing Clustering with Stochastic Merging Based on Kernel Functions

    In this paper, we propose a new Stochastic Message Passing Clustering (SMPC) algorithm for clustering biological data based on the Message Passing Clustering (MPC) algorithm, which we introduced in earlier work. MPC has shown its advantage when applied to describing parallel and spontaneous biological processes. SMPC, as a generalized version of MPC, extends the clustering algorithm from a deterministic process to a stochastic process, adding three major advantages. First, in deciding the merging cluster pair, the influences of all clusters are quantified by probabilities, estimated by kernel functions based on their relative distances. Second, the proposed algorithm properly resolves the "tie" problem, which often occurs for integer distances, as in the case of protein interaction data. Third, clustering can be undone to improve the clustering performance: when the algorithm detects objects that do not have good probabilities inside a cluster, it moves them outside. The test results on colon cancer gene-expression data show that SMPC performs better than the deterministic MPC.
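
The first two advantages can be illustrated with a small sketch. It assumes a Gaussian kernel with a hypothetical `bandwidth` parameter; the abstract does not say which kernel SMPC uses. Distances from one cluster to its candidate partners are turned into merge probabilities, and a partner is then sampled, so equal (tied) distances simply receive equal probability instead of being broken arbitrarily.

```python
import math
import random


def merge_probabilities(distances, bandwidth=1.0):
    """Map distances to probabilities with a Gaussian kernel (assumed choice)."""
    # Subtracting the smallest squared distance keeps the largest weight at 1,
    # which avoids dividing by zero when all raw weights would underflow.
    d_min_sq = min(distances) ** 2
    weights = [math.exp(-(d * d - d_min_sq) / (2.0 * bandwidth ** 2))
               for d in distances]
    total = sum(weights)
    return [w / total for w in weights]


def pick_partner(distances, bandwidth=1.0, rng=random):
    """Sample the index of the cluster to merge with."""
    probs = merge_probabilities(distances, bandwidth)
    return rng.choices(range(len(distances)), weights=probs, k=1)[0]
```

For example, `merge_probabilities([1, 1, 3])` gives the two candidates at distance 1 the same probability and the candidate at distance 3 a much smaller one; a smaller `bandwidth` makes the choice closer to the deterministic nearest-neighbour rule.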

    Applications of Hidden Markov Models in Microarray Gene Expression Data

    Hidden Markov models (HMMs) are well-developed statistical models for capturing hidden information from observable sequential symbols. They were first used in speech recognition in the 1970s and have been successfully applied to the analysis of biological sequences since the late 1980s, for example in finding protein secondary structure, CpG islands and families of related DNA or protein sequences [1]. In an HMM, the system being modeled is assumed to be a Markov process with unknown parameters, and the challenge is to determine the hidden parameters from the observable parameters. In this chapter, we describe two applications of HMMs, predicting gene functions in yeast and DNA copy number alterations in human tumor cells, both based on gene expression microarray data.

    Dynamics of asynchronous random Boolean networks with asynchrony generated by stochastic processes

    An asynchronous Boolean network with N nodes whose states at each time point are determined by certain parent nodes is considered. We make use of the models developed by Matache and Heidel [Matache, M.T., Heidel, J., 2005. Asynchronous random Boolean network model based on elementary cellular automata rule 126. Phys. Rev. E 71, 026232] for a constant number of parents, and Matache [Matache, M.T., 2006. Asynchronous random Boolean network model with variable number of parents based on elementary cellular automata rule 126. IJMPB 20 (8), 897–923] for a varying number of parents. In both these papers the authors consider an asynchronous updating of all nodes, with asynchrony generated by various random distributions. We supplement those results by using various stochastic processes as generators for the number of nodes to be updated at each time point. In this paper we use the following stochastic processes: Poisson process, random walk, birth and death process, Brownian motion, and fractional Brownian motion. We study the dynamics of the model through sensitivity of the orbits to initial values, bifurcation diagrams, and fixed-point analysis. The dynamics of the system show that the number of nodes to be updated at each time point is of great importance, especially for the random walk, the birth and death, and the Brownian motion processes. Small or moderate values for the number of updated nodes generate order, while large values may generate chaos depending on the underlying parameters. The Poisson process generates order. With fractional Brownian motion, as the values of the Hurst parameter increase, the system exhibits order for a wider range of combinations of the underlying parameters.
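
As a rough illustration of the setup, the sketch below simulates a random Boolean network under a generalized rule 126 (a node becomes 0 when it and all of its parents share the same value, and 1 otherwise), with the number of nodes updated per step driven by a simple clipped random walk. The network size, the random parent assignment, the choice to read parent values from the previous state, and the random-walk details are all assumptions for illustration, not the authors' exact model.

```python
import random


def rule126(state, node, parents):
    """Generalized rule 126: 0 if the node and its parents all agree, else 1."""
    values = {state[node]} | {state[p] for p in parents}
    return 0 if len(values) == 1 else 1


def step(state, parents, n_update, rng):
    """Update n_update randomly chosen nodes; the other nodes keep their state."""
    chosen = rng.sample(range(len(state)), n_update)
    new_state = list(state)
    for node in chosen:
        # Parent values are read from the previous state (assumed convention).
        new_state[node] = rule126(state, node, parents[node])
    return new_state


def random_walk_counts(n_nodes, start, steps, rng):
    """Yield the number of nodes to update, a +/-1 walk clipped to [1, n_nodes]."""
    count = start
    for _ in range(steps):
        yield count
        count = min(n_nodes, max(1, count + rng.choice((-1, 1))))


def simulate(n_nodes=20, n_parents=3, steps=50, seed=0):
    """Return the fraction of nodes in state 1 after each time step."""
    rng = random.Random(seed)
    parents = [rng.sample([j for j in range(n_nodes) if j != i], n_parents)
               for i in range(n_nodes)]
    state = [rng.randint(0, 1) for _ in range(n_nodes)]
    density = []
    for n_update in random_walk_counts(n_nodes, n_nodes // 2, steps, rng):
        state = step(state, parents, n_update, rng)
        density.append(sum(state) / n_nodes)
    return density
```

Swapping `random_walk_counts` for another generator (for instance one that draws Poisson-distributed counts) would let the same loop explore the other processes named in the abstract; tracking the density of ones over time is one simple way to look for ordered versus irregular behaviour.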

    Cross-platform Analysis of Cancer Biomarkers: A Bayesian Network Approach to Incorporating Mass Spectrometry and Microarray Data

    Many studies have reported inconsistent cancer biomarkers due to bioinformatics artifacts. In this paper we use multiple data sets from microarrays, mass spectrometry, protein sequences, and other biological knowledge in order to improve the reliability of cancer biomarkers. We present a novel Bayesian network (BN) model which integrates and cross-annotates multiple data sets related to prostate cancer. The main contribution of this study is a method designed to find cancer biomarkers whose presence is supported by multiple data sources and biological knowledge. Relevant biological knowledge is explicitly encoded into the model parameters, and the biomarker finding problem is formulated as a Bayesian inference problem. Besides diagnostic accuracy, we introduce reliability as another quality measurement of the biological relevance of biomarkers. Based on the proposed BN model, we develop an empirical scoring scheme and a simulation algorithm for inferring biomarkers. Fourteen genes/proteins including prostate specific antigen (PSA) are identified as reliable serum biomarkers which are insensitive to the model assumptions. The computational results show that our method is able to find biologically relevant biomarkers with the highest reliability while maintaining competitive predictive power. In addition, by combining biological knowledge and data from multiple platforms, the number of putative biomarkers is greatly reduced to allow more-focused clinical studies.

    Genome wide transcriptional analysis of resting and IL2 activated human natural killer cells: gene expression signatures indicative of novel molecular signaling pathways

    Background: Human natural killer (NK) cells are key contributors to the innate immune response, and the effector functions of these cells are enhanced by cytokines such as interleukin 2 (IL2). We utilized genome-wide transcriptional profiling to identify gene expression signatures and pathways in resting and IL2 activated NK cells isolated from peripheral blood of healthy donors. Results: Gene expression profiling of resting NK cells showed high expression of a number of cytotoxic factors, cytokines, chemokines and inhibitory and activating surface NK receptors. Resting NK cells expressed many genes associated with cellular quiescence and also appeared to have an active TGFβ (TGFB1) signaling pathway. IL2 stimulation induced rapid downregulation of quiescence associated genes and upregulation of genes associated with cell cycle progression and proliferation. Numerous genes that may enhance immune function and responsiveness, including activating receptors (DNAM1, KLRC1 and KLRC3), death receptor ligands (TNFSF6 (FASL) and TRAIL), chemokine receptors (CX3CR1, CCR5 and CCR7), interleukin receptors (IL2RG, IL18RAB and IL27RA) and members of secretory pathways (DEGS1, FKBP11, SSR3, SEC61G and SLC3A2), were upregulated. The expression profile suggested PI3K/AKT activation and NF-κB activation through multiple pathways (TLR/IL1R, TNF receptor induced and TCR-like possibly involving BCL10). Activation of NFAT signaling was supported by increased expression of many pathway members and downstream target genes. The transcription factor GATA3 was expressed in resting cells while T-BET was upregulated on activation, concurrent with the change in cytokine expression profile. The importance of NK cells in the innate immune response was also reflected by late increased expression of inflammatory chemotactic factors and receptors and molecules involved in adhesion and lymphocyte trafficking or migration. Conclusion: This analysis allowed us to identify genes implicated in cellular quiescence and the cytokines and cytotoxic factors ready for immediate immune response. It also allowed us to observe the sequential immunostimulatory effects of IL2 on NK cells, improving our understanding of the biology and molecular mediators behind NK cell activation.